Relaxing the WDO Assumption in Blind Extraction of Speakers from Speech Mixtures

Authors

Włodzimierz Kasprzak

Ning Ding

Nozomu Hamada

Abstract

The time-frequency masking approach to blind speech extraction consists of two main steps: feature clustering in a space spanned by delay time and attenuation rate, and spectrogram masking to reconstruct the sources. Usually a binary mask is generated under the strong W-disjoint orthogonality (WDO) assumption, i.e. that the sources have disjoint supports in the time-frequency domain. In practice, this assumption is frequently violated, which leads to poor quality of the reconstructed sources. In this paper we propose relaxing the WDO assumption by allowing some frequency bins to be shared by both sources. Because instantaneous fundamental frequencies are detected, mask creation is supported by exploiting the harmonic structure of speech. The proposed method is shown to be effective and reliable in experiments with both simulated and real recorded mixtures.

Keywords: blind source extraction, harmonic frequencies, histogram clustering, spectrogram analysis, speech reconstruction, time-frequency masking, W-disjoint orthogonality.
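The abstract does not give the exact masking rule, so the following is only a minimal pure-Python sketch of the idea under stated assumptions. It assumes DUET-style anechoic features per time-frequency bin (attenuation and delay from the ratio of the two channel STFTs), assumes the two cluster centres have already been found (for example as peaks of the delay-attenuation histogram), and uses two hypothetical sharing criteria: a bin is shared when it is almost equidistant from both centres (threshold `share_ratio`), or when it lies near harmonics of both speakers' per-frame F0 estimates (tolerance `tol_hz`). The names `duet_features`, `near_harmonic`, `relaxed_masks`, `share_ratio` and `tol_hz` are illustrative, not taken from the paper.

```python
import cmath
import math


def duet_features(x1, x2, omega):
    """DUET-style features of one time-frequency bin (assumed anechoic model).

    x1, x2: complex STFT values of the two mixture channels at this bin.
    omega:  angular frequency of the bin in rad/sample (must be nonzero).
    Returns (attenuation, delay) for X2 ~ a * exp(-1j * omega * d) * X1.
    The delay is only unambiguous while |omega * d| < pi (no phase aliasing).
    """
    ratio = x2 / x1
    return abs(ratio), -cmath.phase(ratio) / omega


def near_harmonic(freq_hz, f0_hz, tol_hz):
    """True if freq_hz lies within tol_hz of some harmonic k * f0_hz, k >= 1."""
    if f0_hz <= 0:
        return False  # unvoiced frame: no harmonic evidence
    k = max(1, round(freq_hz / f0_hz))
    return abs(freq_hz - k * f0_hz) <= tol_hz


def relaxed_masks(bins, centers, f0s, share_ratio=0.8, tol_hz=20.0):
    """Build binary (strict WDO) and relaxed masks for two sources.

    bins:    list of (attenuation, delay, freq_hz), one entry per TF bin.
    centers: two (attenuation, delay) cluster centres, assumed already found.
    f0s:     (f0 of source 0, f0 of source 1) in Hz for this frame (0 = unvoiced).
    Returns (binary, relaxed): lists of per-bin mask pairs (m0, m1).
    """
    binary, relaxed = [], []
    for a, d, f in bins:
        # Euclidean distance to each centre in the (attenuation, delay) plane.
        dist = [math.hypot(a - ca, d - cd) for ca, cd in centers]
        near = 0 if dist[0] <= dist[1] else 1
        far = 1 - near
        # Strict WDO: every bin belongs to exactly one source.
        hard = (1.0, 0.0) if near == 0 else (0.0, 1.0)
        binary.append(hard)
        # Hypothetical relaxation 1: both centres are almost equally close.
        ambiguous = dist[near] >= share_ratio * dist[far]
        # Hypothetical relaxation 2: the bin lies on harmonics of both speakers.
        both_harmonic = all(near_harmonic(f, f0, tol_hz) for f0 in f0s)
        relaxed.append((1.0, 1.0) if ambiguous or both_harmonic else hard)
    return binary, relaxed


if __name__ == "__main__":
    centers = [(1.0, 0.0), (0.5, 2.0)]
    bins = [(1.0, 0.0, 150.0), (0.75, 1.0, 150.0), (1.0, 0.0, 390.0)]
    binary, relaxed = relaxed_masks(bins, centers, f0s=(100.0, 130.0))
    print(binary)
    print(relaxed)
```

Multiplying each mixture STFT bin by `m0` or `m1` and inverting the STFT would yield the two source estimates. A shared bin `(1.0, 1.0)` is passed to both outputs, which is exactly where a strict binary WDO mask is forced to assign the energy to only one speaker.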


Similar resources

Mokslas – Lietuvos Ateitis

We are developing two crucial improvements to the time-frequency masking approach for blind speech separation of underdetermined mixtures when processing anechoic and echoic mixtures. First, the proposed method copes with the usually large amount of delay estimation error that appears in a low frequency band. This step generates a restrictive mask for phase delays on the basis of local and g...


Phase Aliasing Correction For Robust Blind Source Separation Using DUET

Degenerate Unmixing Estimation Technique (DUET) is a technique for blind source separation (BSS). Unlike the ICA-based BSS techniques, DUET is a time-frequency scheme that relies on the so-called W-disjoint orthogonality (WDO) property of the source signals, which states that the windowed Fourier transforms of different source signals have statistically disjoint supports. In addition to being co...


A Stochastic Speech Model Supporting W-Disjoint Orthogonality

In previous work, we have successfully used an ideal joint sparseness assumption: W-Disjoint Orthogonality (WDO). This assumption, that the time-frequency representations of the sources have disjoint support, is satisfied in an approximate sense by many signals of practical interest, including speech. Here we discuss results derived from a stochastic model of speech signals that justify the WDO...


Blind speech separation of moving speakers in real reverberant environments

In this paper we present a new online Blind Signal Separation method capable of separating convolutive speech signals of moving speakers in highly reverberant rooms. The separation network used is a recurrent network which performs separation of convolutive speech mixtures in the time domain, without any prior knowledge of the propagation media, based on the Maximum Likelihood Estimation (MLE) p...


Adaptive Blind Separation of Speech Signals: Cocktail Party Problem

In this paper we present an online adaptive scheme for blind separation of speech signals from their convolutive mixtures. This problem is often referred to as the cocktail party problem. When multiple speakers speak simultaneously in a teleconferencing studio, we need to separate out each speaker from their mixtures. If the mixtures are assumed to be instantaneous, then it becomes standard blind sourc...


Publication date: 2010
